Smart Paper Notes

Pixel Relationships-based Regularizer for Retinal Vessel Image Segmentation

Lukman Hakim , Takio Kurita

Categories: Computer Vision | Machine Learning

2022-12-28

The task of image segmentation is to classify each pixel in an image with the appropriate label. Various deep learning approaches with deep architectures have been proposed for image segmentation and achieve high accuracy. However, these techniques train with a pixel-wise loss function, which neglects the relationships between neighboring pixels during learning. These neighbor relationships are essential information in an image, and using them provides an advantage over using pixel-to-pixel information alone. This study presents regularizers that supply pixel neighbor relationship information to the learning process. The regularizers are constructed with a graph-theoretic approach and a topological approach. In the graph-theoretic approach, the graph Laplacian is used to exploit the smoothness of segmented images based on the output images and the ground-truth images. In the topological approach, the Euler characteristic is used to identify and minimize the number of isolated objects in segmented images. Experiments show that the scheme successfully captures pixel neighbor relations and improves the performance of the convolutional neural network over the baseline without a regularization term.
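The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the unweighted 4-neighbor grid graph, the function names, and the use of plain nested lists are all assumptions made for illustration, and neither function is differentiable as written. The first function evaluates the graph Laplacian quadratic form (the sum of squared differences over neighboring pixel pairs); the second computes the Euler characteristic of a binary mask by treating every foreground pixel as a closed unit square.

```python
def laplacian_smoothness(img):
    """Sum of squared differences over 4-neighbor pixel pairs.

    This equals y^T L y, where L is the Laplacian of the unweighted
    4-connected grid graph (an assumed graph, not taken from the paper).
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # right neighbor
                total += (img[r][c] - img[r][c + 1]) ** 2
            if r + 1 < h:  # bottom neighbor
                total += (img[r][c] - img[r + 1][c]) ** 2
    return total


def euler_characteristic(mask):
    """Euler characteristic V - E + F of a binary mask.

    Each foreground pixel is a closed unit square, so pixels that touch
    at a corner count as connected. Shared vertices and edges are
    counted once by storing them in sets.
    """
    vertices, edges, faces = set(), set(), 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                faces += 1
                vertices.update({(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)})
                edges.update({("h", r, c), ("h", r + 1, c),
                              ("v", r, c), ("v", r, c + 1)})
    return len(vertices) - len(edges) + faces
```

For a mask without holes, the Euler characteristic equals the number of connected objects, which is why a lower value indicates fewer isolated fragments in a segmentation.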

Translated by Google Translate

Single-Image Super-Resolution Reconstruction based on the Differences of Neighboring Pixels

Huipeng Zheng , Lukman Hakim , Takio Kurita , Junichi Miyao

Categories: Computer Vision | Machine Learning

2022-12-28

Deep learning techniques have been used to improve the performance of single-image super-resolution (SISR). However, most existing CNN-based SISR approaches focus primarily on building deeper or larger networks to extract more significant high-level features. Usually, the pixel-level loss between the target high-resolution image and the estimated image is used, but the neighbor relations between pixels in the image are seldom used. According to observations, however, a pixel's neighbor relationships contain rich information about spatial structure, local context, and structural knowledge. Based on this fact, in this paper we use pixels' neighbor relationships from a different perspective: we propose the differences of neighboring pixels to regularize the CNN by constructing a graph from the estimated image and the ground-truth image. The proposed method outperforms state-of-the-art methods in quantitative and qualitative evaluations on the benchmark datasets. Keywords: Super-resolution, Convolutional Neural Networks, Deep Learning
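One plausible reading of a penalty on the differences of neighboring pixels is sketched below. This is an assumption for illustration only: the abstract does not give the exact form of the regularizer, and the function name, the 4-neighbor grid, and the squared penalty are hypothetical choices. The sketch compares each neighbor difference in the estimated image with the corresponding difference in the ground-truth image.

```python
def neighbor_difference_penalty(estimate, target):
    """Sum of squared mismatches between neighbor differences.

    For every pair of 4-neighboring pixels, the difference in the
    estimate is compared with the same difference in the target.
    Hypothetical form, not confirmed by the abstract.
    """
    h, w = len(target), len(target[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right and bottom neighbors
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    d_est = estimate[r][c] - estimate[rr][cc]
                    d_tgt = target[r][c] - target[rr][cc]
                    total += (d_est - d_tgt) ** 2
    return total
```

Under this assumed form, a uniform brightness offset between the two images incurs no penalty, because only local structure is compared; a pixel-level loss would still be needed to match absolute intensities.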


Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

Novanto Yudistira , Muthu Subash Kavitha , Takio Kurita

Categories: Computer Vision | Artificial Intelligence | Neural and Evolutionary Computing

2020-12-17

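A 3D convolutional neural network (3D CNN) captures spatial and temporal information in 3D data such as video sequences. However, due to the convolution and pooling mechanisms, some information loss seems unavoidable. To improve visual explanation and classification in 3D CNNs, we propose two approaches: i) aggregating layer-wise global-to-local (global-local) discrete gradients using a trained 3DResNext network, and ii) implementing an attention gating network to improve the accuracy of action recognition. The proposed approach intends to show the usefulness of every layer, termed global-local attention, in a 3D CNN through visual attribution, weakly-supervised action localization, and action recognition. First, the 3DResNext is trained and applied to action classification, using backpropagation with respect to the maximum predicted class. The gradients and activations of every layer are then up-sampled. Next, aggregation is used to produce more nuanced attention, which points out the most critical parts of the input video for the predicted class. We apply contour thresholding to the final attention map for the final localization. We evaluate spatial and temporal action localization in trimmed videos using fine-grained visual explanation via 3DCAM. Experimental results show that the proposed approach produces informative visual explanations and discriminative attention. Furthermore, action recognition via attention gating on each layer produces better classification results than the baseline model.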


Simultaneous Acquisition of High Quality RGB Image and Polarization Information using a Sparse Polarization Sensor

Teppei Kurita , Yuhi Kondo , Legong Sun , Yusuke Moriuchi

Categories: Computer Vision

2022-09-27

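This paper proposes a novel polarization sensor structure and network architecture to obtain a high-quality RGB image and polarization information. Conventional polarization sensors can acquire RGB images and polarization information simultaneously, but the polarizers on the sensor degrade the quality of the RGB images. There is a trade-off between RGB image quality and polarization information: fewer polarization pixels reduce the degradation of the RGB image but also lower the resolution of the polarization information. We therefore propose an approach that resolves this trade-off by arranging polarization pixels sparsely on the sensor and compensating the low-resolution polarization information to a higher resolution, using the RGB image as a guide. The proposed network architecture consists of an RGB image refinement network and a polarization information compensation network. We confirmed the superiority of the proposed network in compensating the differential component of polarization intensity by comparing its performance with state-of-the-art methods for a similar task, depth completion. Furthermore, we confirmed that our approach can simultaneously acquire higher-quality RGB images and polarization information than conventional polarization sensors, resolving the trade-off between RGB image quality and polarization information. The baseline code and newly generated real and synthetic large-scale polarization image datasets are available for further research and development.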


Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

Keisuke Shirai , Atsushi Hashimoto , Taichi Nishimura , Hirotaka Kameko , Shuhei Kurita , Yoshitaka Ushiku , Shinsuke Mori

Categories: Natural Language Processing | Artificial Intelligence

2022-09-13

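We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn the result of each cooking action. The dataset consists of object state changes and the workflow of the recipe text. A state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-modal relation. With our dataset, one can try a range of applications, from multimodal commonsense reasoning to procedural text generation.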


ScanQA: 3D Question Answering for Spatial Scene Understanding

Daichi Azuma , Taiki Miyanishi , Shuhei Kurita , Motoki Kawanabe

Categories: Computer Vision

2021-12-20

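We propose a new 3D spatial understanding task, 3D question answering (3D-QA). In the 3D-QA task, a model receives visual information from the entire 3D scene of a rich RGB-D indoor scan and answers a given textual question about that scene. Unlike the 2D question answering of VQA, conventional 2D-QA models struggle with spatial understanding of object alignment and direction, and fail at localizing objects from the textual questions in 3D-QA. We propose a baseline model for 3D-QA, named the ScanQA model, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor correlates language expressions with the underlying geometric features of the 3D scan and facilitates the regression of 3D bounding boxes to determine the objects described in textual questions. We collected human-edited question-answer pairs with free-form answers that are grounded to 3D objects in each 3D scene. Our new ScanQA dataset contains over 41K question-answer pairs from 800 indoor scenes drawn from the ScanNet dataset. To the best of our knowledge, ScanQA is the first large-scale effort to perform object-grounded question answering in 3D environments.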


Computing Diverse Shortest Paths Efficiently: A Theoretical and Experimental Study

Tesshu Hanaka , Yasuaki Kobayashi , Kazuhiro Kurita , See Woo Lee , Yota Otachi

Categories: Artificial Intelligence

2021-12-10

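Finding diverse solutions to combinatorial problems has recently received considerable attention (Baste et al. 2020; Fomin et al. 2020; Hanaka et al. 2021). In this paper, we study the following type of problem: given an integer $k$, the problem asks for $k$ solutions such that the sum of pairwise Hamming distances between these solutions is maximized. Such solutions are called diverse solutions. We present a polynomial-time algorithm for finding diverse shortest $st$-paths in weighted directed graphs. Moreover, we study diverse versions of other classical combinatorial problems, such as diverse weighted matroid bases, diverse weighted arborescences, and diverse bipartite matchings. We show that these problems can also be solved in polynomial time. To evaluate the practical performance of our algorithm for finding diverse shortest $st$-paths, we conduct computational experiments on synthetic and real-world instances. The experiments show that our algorithm successfully computes diverse solutions within reasonable computational time.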
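The objective described above can be made concrete with a small sketch. It only evaluates the diversity of a given collection of solutions; it is not the paper's polynomial-time algorithm. Representing each path as a set of edges and the function name `diversity` are assumptions for illustration. With that representation, the Hamming distance between two solutions is the size of their symmetric difference.

```python
from itertools import combinations


def diversity(solutions):
    """Sum of pairwise Hamming distances between solutions.

    Each solution is a set of edges; the Hamming distance between two
    sets is the number of edges that belong to exactly one of them.
    """
    return sum(len(a ^ b) for a, b in combinations(solutions, 2))


# Hypothetical s-t paths on a tiny graph, each given as a set of edges.
path_via_a = frozenset({("s", "a"), ("a", "t")})
path_via_b = frozenset({("s", "b"), ("b", "t")})

print(diversity([path_via_a, path_via_b]))              # two disjoint paths
print(diversity([path_via_a, path_via_b, path_via_a]))  # a repeated path adds no distance to its copy
```

A collection may contain the same path more than once under this measure; a repeated path contributes zero distance to its duplicate but still counts against every other solution.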
